IN THE CLAIMS:

Please amend claims 1, 7, 8, 10, 26, 30-36, and 38-41. Please cancel claim 37. All pending 
claims are reproduced below. 

1. (Currently Amended) An apparatus for direct annotation of objects, the apparatus 
comprising: 

a display device for displaying one or more images; 
an audio input device for receiving an audio input; and 

a direct annotation creation module coupled to receive an input audio signal from the
audio input device and to receive a reference to an image from the display device,
the direct annotation creation module, in response to receiving the audio input
signal and the reference to the image, automatically creating an annotation object,
independent from the image, that associates the input audio signal with the
image displayed on the display device.

2. (Original) The apparatus of claim 1 further comprising an annotation display module 
coupled to the direct annotation creation module, the annotation display module generating 
symbols or text representing the annotation objects. 

3. (Original) The apparatus of claim 1 further comprising an annotation audio output 
module coupled to the direct annotation creation module, the annotation audio output module 
generating audio output in response to user selection of an annotation symbol representing an 
annotation object. 

4. (Original) The apparatus of claim 1 further comprising: 

an audio vocabulary storage for storing a plurality of audio signals and corresponding text
strings; 

an audio vocabulary comparison module coupled to the audio input device, the audio 
vocabulary storage and the direct annotation creation module, the audio 
vocabulary comparison module receiving audio input and finding a corresponding 
text string that matches the audio input; and 

wherein the direct annotation creation module uses text strings found by the audio 
vocabulary comparison module to create the audio annotation. 

5. (Original) The apparatus of claim 1 further comprising: 

an audio vocabulary storage for storing a plurality of audio signals and corresponding text 
strings; 

a dynamic vocabulary updating module coupled to the audio vocabulary storage and the 
audio input device, the dynamic vocabulary updating module for displaying an 
interface to create a new entry in the audio vocabulary storage, the dynamic 
vocabulary updating module receiving an audio input and a text string and 
creating the new entry in the audio vocabulary storage. 

6. (Original) The apparatus of claim 1 further comprising a media object cache for 
storing media and annotation objects. 

7. (Currently Amended) An apparatus for direct annotation of objects for use with a 
system for storing, accessing, and presenting objects such as video objects, text objects, audio 
objects, or image objects, the apparatus comprising:

a direct annotation creation module coupled to receive an input audio signal and a
reference to an image, the direct annotation creation module creating an
annotation object, independent of the image, that associates a symbol or text with
the image; and 

an annotation display module coupled to the direct annotation creation module, the 
annotation display module generating the symbol or text representing the 
annotation object on a display device. 

8. (Currently Amended) An apparatus for direct annotation of objects for use with a 
system for storing, accessing, and presenting objects such as video objects, text objects, audio 
objects, or image objects, the apparatus comprising:

a direct annotation creation module coupled to receive an input audio signal and a 
reference to an image, the direct annotation creation module creating an 
annotation object, independent of the image, that associates the input audio signal
and the image, the annotation object including at least an audio input field, an
image reference field, and an annotation location field; and

an annotation audio output module coupled to the direct annotation creation module, the 
annotation audio output module generating audio output in response to user 
selection of an annotation symbol representing the annotation object. 

9. (Currently Amended) An apparatus for direct annotation of objects, the apparatus 
comprising: 

a media object storage for storing media and annotation objects; and 

a direct annotation creation module coupled to receive an input audio signal and a
reference to an image, the direct annotation creation module creating an
annotation object, independent of the image, that associates the input audio signal
and the image, the direct annotation creation module storing the audio annotation 
in the media object storage. 

10. (Currently Amended) A computer implemented method for direct annotation of
objects, the method comprising the steps of: 

displaying an image; 

receiving audio input; 

detecting selection of an image; and 

creating an annotation object, independent of the selected image, between the selected 
image and the audio input, the annotation object including at least an audio input
field, an image reference field, and an annotation location field.

11. (Original) The method of claim 10, wherein the step of displaying is performed before 
or simultaneously with the step of receiving. 

12. (Original) The method of claim 10, wherein the step of receiving is performed before 
or simultaneously with the step of displaying. 

13. (Original) The method of claim 10, wherein the step of detecting selection includes 
detecting a portion of the image; and wherein the annotation creates an association between the 
portion of the image and the audio input. 

14. (Original) The method of claim 10, further comprising the step of displaying a visual
notation that the image has an annotation. 

15. (Original) The method of claim 14, wherein the visual notation is text or a symbol. 

16. (Original) The method of claim 10, wherein the step of creating an annotation 
includes creating an annotation object and storing the annotation object in an object storage. 

17. (Original) The method of claim 10, further comprising the step of recording the audio 
input received. 

18. (Original) The method of claim 17, wherein the step of creating an annotation 
includes creating an annotation object and storing the recorded audio input as part of the 
annotation object. 

19. (Original) The method of claim 10, further comprising the step of comparing the 
audio input to a vocabulary to produce text. 

20. (Original) The method of claim 19, wherein the step of creating an annotation 
includes creating an annotation object and storing the text as part of the annotation object. 

21. (Original) The method of claim 10, further comprising the steps of: 
comparing the audio input to a vocabulary; 

determining if the audio input has a matching entry in the vocabulary; and 

storing the entry as part of the annotation object if the audio input has a matching entry in
the vocabulary. 

22. (Original) The method of claim 21, further comprising the steps of: 
determining if the audio input has a close match in the vocabulary; 
displaying the close matches; 

receiving input selecting a close match; and 

storing the selected close match as part of the annotation object if the audio input has a 
close match in the vocabulary. 

23. (Original) The method of claim 22, further comprising the step of displaying a 
message that the image has not been annotated if there is neither a matching entry in the 
vocabulary nor a close match in the vocabulary. 

24. (Original) The method of claim 22, further comprising the following steps if there is 
neither a matching entry in the vocabulary nor a close match in the vocabulary: 

receiving text input corresponding to the audio input; 

updating the vocabulary with a new entry including the audio input and the text input; and 
wherein the received text is stored as part of the annotation object. 

25. (Original) The method of claim 10, further comprising the steps of: 
receiving text input corresponding to the audio input; 

updating the vocabulary with a new entry including the audio input and the text input. 

26. (Currently Amended) A computer implemented method for direct annotation of
objects, the method comprising the steps of: 

displaying an image; 
receiving audio input; 
detecting selection of an image; 

comparing the audio input to a vocabulary to produce text; and 

creating an annotation object, independent of the selected image, between the selected
image and the text, the annotation object including at least a text annotation field,
an image reference field, and an annotation location field.

27. (Original) The method of claim 26, further comprising the step of recording the audio 
input received. 

28. (Original) The method of claim 27, wherein the step of creating an annotation 
includes creating an annotation object including a reference to the selected image, the recorded 
audio input and the text, and storing the annotation object in an object storage. 

29. (Original) The method of claim 26, wherein the step of creating an annotation 
includes creating an annotation object and storing the text as part of the annotation object. 

30. (Original) The method of claim 26, further comprising the steps of: 
determining if the audio input has a matching entry in the vocabulary; and 

storing the entry as part of the annotation object if the audio input has a matching entry in 
the vocabulary. 

31. (Currently Amended) The method of claim 29, further comprising the steps of:

determining if the audio input has a close match in the vocabulary; 

displaying the close matches; 

receiving input selecting a close match; and 

storing the selected close match as part of the annotation object if the audio input has a 
close match in the vocabulary. 

32. (Currently Amended) The method of claim 30, further comprising the step of
displaying a message that the image has not been annotated if there is neither a matching entry in 
the vocabulary nor a close match in the vocabulary. 

33. (Currently Amended) The method of claim 30, further comprising the following
steps if there is neither a matching entry in the vocabulary nor a close match in the vocabulary: 
receiving text input corresponding to the audio input; 

updating the vocabulary with a new entry including the audio input and the text input; and 
wherein the received text is stored as part of the annotation object. 

34. (Currently Amended) The method of claim 26, further comprising the steps of:
receiving text input corresponding to the audio input; 

updating the vocabulary with a new entry including the audio input and the text input. 

35. (Currently Amended) A computer implemented method for displaying objects with
annotations, the method comprising the steps of: 
retrieving an image; 

displaying the image with a visual notation that an annotation exists;
receiving user selection of an image;
outputting a notation associated with the selected image;
determining whether the annotation includes text;
retrieving a text annotation for the selected image; and
displaying the retrieved text with the image.

36. (Currently Amended) The method of claim 35, wherein the annotation is text and
the step of outputting is displaying the text proximate an image that it annotates. 

37. (Currently Amended) The method of claim 35, wherein the annotation is an
audio signal and the step of outputting is playing the audio signal. 

37. (Canceled) 

39. (Currently Amended) The method of claim 35, further comprising the steps of:
determining whether the annotation includes an audio signal;
retrieving an audio signal for the selected image; and
wherein the step of outputting is playing the audio signal. 

40. (Currently Amended) A computer implemented method for retrieving images, the
method comprising the steps of:
receiving audio input; 

determining annotation objects that reference a close match to the audio input; 
retrieving the images that are referenced by the determined annotation objects; and 
displaying the retrieved images, the annotation object including at least an audio input
field, an image reference field, and an annotation location field.

41. (Currently Amended) The method of claim 40, wherein the step of determining
annotation objects further comprises the steps of:

comparing the audio input to an audio signal referenced by an annotation object; and
determining a close match between the audio input and the audio signal referenced by an
annotation object if a probability metric is greater than a threshold of 80%.

42. (Currently Amended) The method of claim 40, wherein the step of determining
annotation objects further comprises the steps of:

determining the annotation objects for a plurality of images;

for each annotation object, comparing the audio input to an audio signal referenced by an
annotation object; and

determining a close match between the audio input and the audio signal referenced by an
annotation object if a probability metric is greater than a threshold of 80%.